CWig: compressed representation of Wiggle/BedGraph format

Authors

  • Huy Hoang Do
  • Wing-Kin Sung
Abstract

MOTIVATION: BigWig, a format for representing read density data, is one of the most popular data formats. BigWig files can represent peak intensity in ChIP-seq, transcript expression in RNA-seq, copy number variation in whole-genome sequencing, etc. The UCSC ENCODE project uses the bigWig format heavily for storage and visualization. Of the 5.2 TB ENCODE hg19 database, 1.6 TB (31% of the total space) is used to store bigWig files. The bigWig format not only saves a lot of space but also supports fast queries, which are crucial for interactive analysis and browsing. In our benchmark, a bigWig file is often similar in size to the gzipped raw data while still supporting ~5000 random queries per second. RESULTS: Although bigWig is good enough at the moment, both storage space and query time are expected to become limiting as sequencing gets cheaper. This article describes a new method for storing density data, named CWig. The format uses, on average, one-third of the space of existing bigWig files and improves random query speed by up to 100 times. AVAILABILITY AND IMPLEMENTATION: http://genome.ddns.comp.nus.edu.sg/~cwig.

Similar resources

Deblocking Joint Photographic Experts Group Compressed Images via Self-learning Sparse Representation

JPEG is one of the most widely used image compression methods, but it causes annoying blocking artifacts at low bit-rates. Sparse representation is an efficient technique that can solve many inverse problems in image processing applications, such as denoising and deblocking. In this paper, a post-processing method is proposed for reducing JPEG blocking effects via sparse representation. In this ...

Accelerating Magnetic Resonance Imaging through Compressed Sensing Theory in the k-space Direction

Magnetic Resonance Imaging (MRI) is a noninvasive imaging method widely used in medical diagnosis. Data in MRI are obtained line-by-line within the K-space, where there are usually a great number of such lines. For this reason, magnetic resonance imaging is slow. MRI can be accelerated through several methods such as parallel imaging and compressed sensing, where a fraction of the K-space lines...

The Genomedata format for storing large-scale functional genomics data

SUMMARY We present a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. We have also developed utilities to load data into this format. We show that retrieving data from this format is more than 2900 times faster than a naive approach using wigg...

Comparison of the quality of medical record documentation for inpatients in general hospitals of Iran University of Medical Sciences and the Social Security Organization in Tehran: 1386

Introduction: The quality of patient care is directly linked to the quality of medical documentation, because in all medical professions related to patient care, the quality of decisions depends on the quality of information. Thus, in this study, the two main providers of medical care in the country, the Ministry of Health (MoH) and the Social Security Organization, were selected to measure access rate, and level of med...

Digipaper: A Versatile Color Document Image Representation

We describe a segmentation method and associated file format for storing images of color documents. We separate each page of the document into three layers, containing the background (usually one or more photographic images), the text, and the color of the text. Each of these layers has different properties, making it desirable to use different compression methods to represent the three layers....

Journal:
  • Bioinformatics

Volume 30, Issue 18

Pages: -

Publication date: 2014